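When the dependent variable is a binary yes/no outcome, logistic regression (also called logit regression) can be used. It models the log-odds of the "yes" outcome as a linear function of the independent variables: $$ \operatorname{logit}(p)=\ln \left(\frac{p}{1-p}\right)=\alpha+\sum_i \beta_i x_i=y^{\prime} $$
In this model, $p$ is the probability that the outcome is "yes" and $1-p$ is the probability that it is "no". The linear equation and the values of the independent variables give $\operatorname{logit}(p)$, from which $p$ follows: $p=\frac{1}{1+\mathrm{e}^{-y^{\prime}}}$, which is the sigmoid function. In R, logistic regression is fitted with the generalized linear model function glm(), which can fit many kinds of regression models.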
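The relationship between the logit and the sigmoid function can be checked numerically. The following is a minimal sketch in base R; the helper names `logit` and `sigmoid` and the values in `y` are illustrative assumptions, not part of the original example.

```r
# Log-odds (logit) of a probability p, written out from the model equation
logit <- function(p) log(p / (1 - p))

# Inverse of the logit: the sigmoid function p = 1 / (1 + exp(-y))
sigmoid <- function(y) 1 / (1 + exp(-y))

y <- c(-2, 0, 2)      # illustrative values of the linear predictor y'
p <- sigmoid(y)       # corresponding probabilities of "yes"

# Converting back recovers the linear predictor
print(all.equal(logit(p), y))

# Base R provides the same pair as qlogis() (logit) and plogis() (sigmoid)
print(all.equal(plogis(y), p))
print(all.equal(qlogis(p), y))
```

A linear predictor of 0 corresponds to $p = 0.5$; positive values push $p$ toward 1 and negative values toward 0.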
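Example. The famous film Titanic shows the choices people made in the face of disaster: women and children were given priority for rescue. The data in this example come from the real Titanic disaster of 1912 and record the age, sex, passenger class, and survival status of 1309 passengers. They are available as the TitanicSurvival data set in the carData package.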
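library(carData)  # provides the TitanicSurvival data set
# Fit a logistic regression: binomial family with its default logit link
logit.TS <- glm(survived ~ sex + age + factor(passengerClass),
                family = binomial, data = TitanicSurvival)
summary(logit.TS)  # coefficient table, deviances, and AIC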
Output:
Call:
glm(formula = survived ~ sex + age + factor(passengerClass),
family = binomial, data = TitanicSurvival)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.6399 -0.6979 -0.4336 0.6688 2.3964
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.522074 0.326702 10.781 < 2e-16 ***
sexmale -2.497845 0.166037 -15.044 < 2e-16 ***
age -0.034393 0.006331 -5.433 5.56e-08 ***
factor(passengerClass)2nd -1.280570 0.225538 -5.678 1.36e-08 ***
factor(passengerClass)3rd -2.289661 0.225802 -10.140 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1414.62 on 1045 degrees of freedom
Residual deviance: 982.45 on 1041 degrees of freedom
(263 observations deleted due to missingness)
AIC: 992.45
Number of Fisher Scoring iterations: 4
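The deviance, AIC, and degrees-of-freedom figures in this output are related by simple arithmetic. The sketch below copies the numbers from the summary and reproduces those relationships, including a likelihood-ratio test of the fitted model against the intercept-only model. The variable names are illustrative assumptions, and the AIC identity used here holds for a binary (0/1) response.

```r
# Values copied from the summary output above
null_dev  <- 1414.62; null_df  <- 1045
resid_dev <- 982.45;  resid_df <- 1041
n_coef    <- 5   # intercept + sexmale + age + two class dummies

# Likelihood-ratio statistic: drop in deviance from the intercept-only model
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# For a binary response, AIC = residual deviance + 2 * number of coefficients
aic <- resid_dev + 2 * n_coef

# Observations used = null df + 1; adding the 263 dropped rows gives the data set size
n_used  <- null_df + 1
n_total <- n_used + 263

print(c(lr_stat = lr_stat, lr_df = lr_df, aic = aic,
        n_used = n_used, n_total = n_total))
print(p_value)
```

The recomputed AIC matches the reported 992.45, and the observation count recovers the 1309 rows of the data set.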
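In this example, the model formula is survived ~ sex + age + factor(passengerClass); it relates survival status (survived) to sex (sex), age (age), and passenger class (passengerClass). survived is a binary variable (yes/no), sex is also binary (female/male), age ranges from 0.1667 to 80 years, and passenger class has three levels. Class is treated as a categorical variable and encoded with dummy variables. The family argument of glm() is set to binomial because a binary outcome follows a binomial distribution; the default link of that family is the logit, which gives logistic regression. 263 observations have missing values and are dropped automatically during fitting. The results show that survival is highly significantly associated with sex, age, and passenger class (all p < 0.001). The negative coefficients for sexmale, age, and the 2nd and 3rd classes indicate that women, younger passengers, and first-class passengers were more likely to survive; the pattern for sex and age is consistent with women and children being rescued first, which also reflects the noble side of human nature.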
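To make the coefficients easier to interpret, they can be converted to odds ratios and to predicted probabilities. The sketch below hard-codes the estimates from the summary output so that it runs without the data; the vector `b`, the function `predict_survival`, and the two example passengers are hypothetical names and inputs chosen for illustration.

```r
# Coefficient estimates copied from the summary output
b <- c(intercept = 3.522074, sexmale = -2.497845, age = -0.034393,
       class2nd = -1.280570, class3rd = -2.289661)

# Odds ratios: exp(beta) is the multiplicative change in the odds of survival
odds_ratio <- exp(b[-1])
print(round(odds_ratio, 3))

# Predicted survival probability for a hypothetical passenger
predict_survival <- function(male, age, class) {
  eta <- b[["intercept"]] + b[["sexmale"]] * male + b[["age"]] * age +
    b[["class2nd"]] * (class == "2nd") + b[["class3rd"]] * (class == "3rd")
  plogis(eta)  # sigmoid of the linear predictor
}

p_woman_1st <- predict_survival(male = 0, age = 30, class = "1st")
p_man_3rd   <- predict_survival(male = 1, age = 30, class = "3rd")
print(c(woman_1st = p_woman_1st, man_3rd = p_man_3rd))
```

Under these estimates, a 30-year-old woman in first class has a predicted survival probability of roughly 0.92, against roughly 0.09 for a 30-year-old man in third class. With the fitted model object available, exp(coef(logit.TS)) and predict(logit.TS, newdata = ..., type = "response") should give the same quantities directly.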
TitanicSurvival
(row name) | survived | sex | age | passengerClass |
---|---|---|---|---|
Allen, Miss. Elisabeth Walton | yes | female | 29.0000 | 1st |
Allison, Master. Hudson Trevor | yes | male | 0.9167 | 1st |
Allison, Miss. Helen Loraine | no | female | 2.0000 | 1st |
Allison, Mr. Hudson Joshua Crei | no | male | 30.0000 | 1st |
Allison, Mrs. Hudson J C (Bessi | no | female | 25.0000 | 1st |
Anderson, Mr. Harry | yes | male | 48.0000 | 1st |
Andrews, Miss. Kornelia Theodos | yes | female | 63.0000 | 1st |
Andrews, Mr. Thomas Jr | no | male | 39.0000 | 1st |
Appleton, Mrs. Edward Dale (Cha | yes | female | 53.0000 | 1st |
Artagaveytia, Mr. Ramon | no | male | 71.0000 | 1st |
Astor, Col. John Jacob | no | male | 47.0000 | 1st |
Astor, Mrs. John Jacob (Madelei | yes | female | 18.0000 | 1st |
Aubart, Mme. Leontine Pauline | yes | female | 24.0000 | 1st |
Barber, Miss. Ellen Nellie | yes | female | 26.0000 | 1st |
Barkworth, Mr. Algernon Henry W | yes | male | 80.0000 | 1st |
Baumann, Mr. John D | no | male | NA | 1st |
Baxter, Mr. Quigg Edmond | no | male | 24.0000 | 1st |
Baxter, Mrs. James (Helene DeLa | yes | female | 50.0000 | 1st |
Bazzani, Miss. Albina | yes | female | 32.0000 | 1st |
Beattie, Mr. Thomson | no | male | 36.0000 | 1st |
Beckwith, Mr. Richard Leonard | yes | male | 37.0000 | 1st |
Beckwith, Mrs. Richard Leonard | yes | female | 47.0000 | 1st |
Behr, Mr. Karl Howell | yes | male | 26.0000 | 1st |
Bidois, Miss. Rosalie | yes | female | 42.0000 | 1st |
Bird, Miss. Ellen | yes | female | 29.0000 | 1st |
Birnbaum, Mr. Jakob | no | male | 25.0000 | 1st |
Bishop, Mr. Dickinson H | yes | male | 25.0000 | 1st |
Bishop, Mrs. Dickinson H (Helen | yes | female | 19.0000 | 1st |
Bissette, Miss. Amelia | yes | female | 35.0000 | 1st |
Bjornstrom-Steffansson, Mr. Mau | yes | male | 28.0000 | 1st |
... | ... | ... | ... | ... |
Vestrom, Miss. Hulda Amanda Ado | no | female | 14.0 | 3rd |
Vovk, Mr. Janko | no | male | 22.0 | 3rd |
Waelens, Mr. Achille | no | male | 22.0 | 3rd |
Ware, Mr. Frederick | no | male | NA | 3rd |
Warren, Mr. Charles William | no | male | NA | 3rd |
Webber, Mr. James | no | male | NA | 3rd |
Wenzel, Mr. Linhart | no | male | 32.5 | 3rd |
Whabee, Mrs. George Joseph (Sha | yes | female | 38.0 | 3rd |
Widegren, Mr. Carl/Charles Pete | no | male | 51.0 | 3rd |
Wiklund, Mr. Jakob Alfred | no | male | 18.0 | 3rd |
Wiklund, Mr. Karl Johan | no | male | 21.0 | 3rd |
Wilkes, Mrs. James (Ellen Needs | yes | female | 47.0 | 3rd |
Willer, Mr. Aaron (Abi Weller | no | male | NA | 3rd |
Willey, Mr. Edward | no | male | NA | 3rd |
Williams, Mr. Howard Hugh Harr | no | male | NA | 3rd |
Williams, Mr. Leslie | no | male | 28.5 | 3rd |
Windelov, Mr. Einar | no | male | 21.0 | 3rd |
Wirz, Mr. Albert | no | male | 27.0 | 3rd |
Wiseman, Mr. Phillippe | no | male | NA | 3rd |
Wittevrongel, Mr. Camille | no | male | 36.0 | 3rd |
Yasbeck, Mr. Antoni | no | male | 27.0 | 3rd |
Yasbeck, Mrs. Antoni (Selini Al | yes | female | 15.0 | 3rd |
Youseff, Mr. Gerious | no | male | 45.5 | 3rd |
Yousif, Mr. Wazli | no | male | NA | 3rd |
Yousseff, Mr. Gerious | no | male | NA | 3rd |
Zabour, Miss. Hileni | no | female | 14.5 | 3rd |
Zabour, Miss. Thamine | no | female | NA | 3rd |
Zakarian, Mr. Mapriededer | no | male | 26.5 | 3rd |
Zakarian, Mr. Ortin | no | male | 27.0 | 3rd |
Zimmerman, Mr. Leo | no | male | 29.0 | 3rd |